Abstract

Since the length of microblog texts, such as tweets, is strictly
limited to 140 characters, traditional Information Retrieval
techniques suffer severely from the vocabulary mismatch problem and
cannot yield good performance in the microblogosphere. To address
this critical challenge, we propose a new language modeling approach
for microblog retrieval that infers various types of context
information. In particular, we expand the query using knowledge terms
derived from Freebase so that the expanded query better reflects
users’ search intent. In addition, to further satisfy users’ real-time
information need, we incorporate temporal evidence into the expansion
method, which boosts recent tweets in the retrieval results for a
given topic. Experimental results on two official TREC Twitter corpora
demonstrate the significant superiority of our approach over baseline
methods.


Introduction 

Information Retrieval (IR) in the microblogosphere, such as Twitter,
has attracted increasing research attention along with the fast
development of social media. To explore information seeking behavior
in the microblogosphere, TREC first introduced a Real-Time Search
Task (RTST) in 2011 (Ounis et al. 2012), which can be summarized as
“At time T, give me the most relevant tweets about topic X”.

However, it is inherently challenging to develop an effective
real-time IR platform in the microblogosphere. First, in contrast to
traditional web search, the real-time search task usually faces a
severe vocabulary mismatch problem. Since tweets are very short, there
is a large risk that query terms fail to match any word observed in
relevant tweets. This problem is especially severe when people search
for entities with several alternative aliases. Second, real-time
search usually indicates an information need about something
happening right now. Thus, it is crucial for the IR approach to favor
recent tweets relevant to the given topic. This real-time information
need requires search engines to trade off between recency and the
relevance score computed between the query and the tweet.

Query Expansion (QE) methods based on pseudo-relevance feedback (PRF)
(Liang, Qiang, and Yang 2012; Lv and Zhai 2009; Zhai and Lafferty
2001a) are widely used in microblog search to mitigate the problems
mentioned above. However, these methods rely heavily on the assumption
that the top ranked documents in the initial search are relevant and
contain good words for query expansion. In the real world, this
assumption does not always hold in the microblogosphere (Cao et al.
2008; Miyanishi, Seki, and Uehara 2013), for example when the query
contains proper nouns that are difficult to understand. Moreover, even
if the top ranked documents are highly relevant to the topic, it is
still very likely that they contain numerous topic-unrelated words due
to the informality of tweet content (Miyanishi, Seki, and Uehara
2013).

To overcome the limitations of existing methods, we utilize Freebase
as the knowledge source to infer more topic-related context
information for each query. Freebase is a practical, scalable tuple
database used to organize general human knowledge (Bollacker et al.
2008), covering a large amount of knowledge in different aspects
(domains), into a hierarchical structure. In contrast to Wikipedia,
which describes human knowledge with long detailed articles, and
WordNet, which mainly contains synonymy relations, Freebase represents
human knowledge using an ontological structure (i.e. types).
Different types, including alias, notable_for and description, provide
different data views for each specific concept. In this paper, we
propose a knowledge query generation method, in which we first match
related concepts in Freebase with respect to the query, and then
extract useful terms from different properties of the concepts to
generate the knowledge query. By interpolating the original query with
the knowledge query, we can better reflect the users’ information
need.

To further utilize the temporal evidence in the microblogosphere, we
follow the work of (Li and Croft 2003) and incorporate a prior
distribution on the recency of documents into the language modeling
framework. More specifically, while selecting top knowledge terms from
Freebase using an association based method, we assign each top ranked
pseudo-relevance document a time prior so that words appearing more
often in recent documents are associated with higher probability.

The main contributions of this paper include: (1) we propose a novel
approach to generate knowledge terms from Freebase to expand the
original query, which leads to a better understanding of the
information need; (2) temporal evidence is incorporated into our QE
method to trade off between relevance and recency; (3) we perform a
set of experiments on two official Twitter test collections published
by TREC to compare our proposed method with state-of-the-art baseline
methods. The experimental results demonstrate that our proposed
approach gives rise to significantly better retrieval performance.


Related Work 

PRF-based Query Expansion 

QE methods based on PRF assume that the most frequent terms in the
pseudo-relevance documents are useful, which may not always hold in
practice. (Cao et al. 2008) integrated a term classification process
to predict the effectiveness of expansion terms. (Miyanishi, Seki, and
Uehara 2013) proposed a manual tweet selection feedback (TSF) to
improve the retrieval performance. They further used a two-stage PRF
based on the similarity of temporal profiles of the query and top
retrieved tweets. However, this method sometimes fails due to the
content redundancy of tweets, which contain meaningless words that may
degrade search results. Thus, to further improve the retrieval
performance using TSF, they suggest detecting important concepts from
the feedback tweet.


Knowledge-based Query Expansion 

Several approaches have been proposed to use external [...] to improve
query expansion (Collins-Thompson and Callan 2005; Xu, Jones, and Wang
2009; Kotov and Zhai 2012). (Li et al. 2007) explored the
possibilities of using Wikipedia to expand ad-hoc queries and
demonstrated that Wikipedia is especially useful for improving weak
queries that PRF is unable to improve. In their method, expansion
terms were extracted from the top ranked Wikipedia articles.
To fill the gap between the vocabularies used in indexed documents and
user queries, (Aggarwal and Buitelaar 2012) used Wikipedia to retrieve
the K-best related concepts to the query. They utilized Wikipedia and
DBPedia to generate the concept candidates, and then ranked them
according to the semantic relatedness score given by the
Wikipedia-based Explicit Semantic Analysis (ESA) (Gabrilovich and
Markovitch 2007). (Pan et al. 2013) proposed using Dempster-Shafer’s
Evidence Theory to measure the certainty of expansion terms from the
Freebase structure. To the best of our knowledge, query expansion
based on Freebase knowledge is novel and effective in microblog
search. Unlike previous works, our method explores the related
concepts in Freebase and attempts to find their aliases to solve the
vocabulary mismatch problem. In addition, an association based term
selection method is adopted to select useful expansion terms to better
understand the users’ search intent.


Temporal Evidence 

Previous works showed that temporal evidence can be incorporated into
IR (Dakka, Gravano, and Ipeirotis 2012; Dong et al. 2010). (Li and
Croft 2003) exploited a prior distribution on the recency of documents
in the language modeling framework for retrieval. (Liang, Qiang, and
Yang 2012) proposed a temporal re-ranking component to evaluate the
temporal aspects of documents. (Efron and Golovchinsky 2011a) proposed
IR methods using temporal properties in language modeling and showed
their effectiveness for recency queries. (Miyanishi, Seki, and Uehara
2013) assumed that similar temporal models share similar temporal
properties and proposed a query-document dependent temporal relevance
model. (Albakour, Macdonald, and Ounis 2013) introduced a decay factor
to balance the short-term and long-term interests for a given topic.
In our study, the temporal evidence is incorporated in the expansion
method in order to enhance the importance of words that have recently
been used often to describe the concept.


Proposed Methods 

Given the RTST, we assume that a query Q is obtained as a sample from
a generative model $\theta_Q$, while the document D is generated by a
model $\theta_D$. Given estimates of the query and document language
models $\theta_Q$ and $\theta_D$, according to (Lafferty and Zhai
2001), the relevance score of D with respect to Q can be computed by
the following negative KL-divergence function:

$$S(Q, D) = -D(\theta_Q \,\|\, \theta_D) \propto \sum_{w \in V} P(w|\theta_Q) \cdot \log P(w|\theta_D) \tag{1}$$

Within this ranking formula, the retrieval problem is essentially
equivalent to the problem of estimating $\theta_Q$ and $\theta_D$. In
principle, we can use any language model for the query and document,
which is very flexible.
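
As a minimal sketch of how Eq. 1 can be evaluated, the following Python code scores a tokenized tweet against a query model. It assumes a maximum likelihood query model and a Dirichlet-smoothed document model with the smoothing parameter set to 100, as in the experimental setup. The function names, the toy tweets, and the small probability floor for unseen words are illustrative assumptions and not part of the original system.

```python
import math
from collections import Counter


def mle_model(tokens):
    """Maximum likelihood estimate of P(w | theta) from a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def dirichlet_prob(word, doc_counts, doc_len, collection_model, mu=100.0):
    """Dirichlet-smoothed P(w | theta_D).

    The 1e-9 floor for words missing from the collection model is an
    assumption made only to keep the logarithm finite in this sketch.
    """
    p_c = collection_model.get(word, 1e-9)
    return (doc_counts.get(word, 0) + mu * p_c) / (doc_len + mu)


def kl_score(query_model, doc_tokens, collection_model, mu=100.0):
    """Rank-equivalent form of Eq. 1: sum_w P(w|theta_Q) * log P(w|theta_D)."""
    doc_counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    return sum(
        p_q * math.log(dirichlet_prob(w, doc_counts, doc_len, collection_model, mu))
        for w, p_q in query_model.items()
    )


# Toy usage: two tokenized tweets scored against the query "mila kunis oz movie".
tweets = [
    ["new", "oz", "movie", "mila", "kunis"],
    ["australian", "open", "tennis", "final", "murray"],
]
collection_model = mle_model([t for tweet in tweets for t in tweet])
query_model = mle_model(["mila", "kunis", "oz", "movie"])
scores = [kl_score(query_model, tweet, collection_model) for tweet in tweets]
```

Only words with non-zero query probability contribute to the sum, so the loop runs over the query model rather than the whole vocabulary.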

The starting point of our study is to infer more topic-related context
for the query with the help of Freebase. In this section, we first
explain why we choose Freebase as our knowledge base. Based on the
characteristics of Freebase, we then describe our proposed method of
knowledge query generation in detail. Further improvements can be
obtained by combining the knowledge-based query expansion with a
model-based pseudo-relevance feedback method.







































Why We Choose Freebase 

Freebase is a large collaborative knowledge base consisting of data
harvested from sources such as the Semantic Web and Wikipedia, as well
as individually contributed data from community members (Bollacker,
Cook, and Tufts 2007). In Freebase, human knowledge is described by
structured categories, which are also known as types, and each type
has a number of defined properties. In this way, Freebase merges the
scalability of structured databases with the diversity of
collaborative wikis into a structured body of general human knowledge
(Bollacker et al. 2008). Just as properties are grouped into types,
types themselves are grouped into domains. Domains can be thought of
as the sections in your favorite newspaper: Business, Life Style, Arts
and Entertainment, Politics, Economics, etc.

Table 1 shows some topic types of the Freebase concept “Mila Kunis” in
the common domain. As we can see, most types provide useful
information for understanding the concept “Mila Kunis” and thus can be
used as knowledge context of the original concept. This structured
knowledge has two advantages over semi-structured or plain content:


1. When searching Freebase (with the API), different types can be
integrated to match a concept more accurately, and some types such as
name and alias are more important;


2. When generating knowledge terms, we can treat different types and
the corresponding properties as different evidence sources.


Table 1: Common topic types of Freebase concept “Mila Kunis”.

| Type | Property |
|---|---|
| name | Mila Kunis |
| alias | Milena Markovna Kunis, ... |
| notable_for | Actor |
| notable_types | Celebrity |
| description | Milena Markovna is an American actress and voice artist. In 1991, at the age of seven, she moved from the Soviet Union to Los Angeles with her family ... [Summary Description From Wikipedia] |


Moreover, unlike Twitter, which includes many meaningless and
topic-unrelated terms (see the terms used in some top retrieved tweets
for TREC topics in Table 2), the terms used in Freebase are always
quite formal and semantically related to the specific concept. Thus,
we assume that using knowledge terms for query expansion can be more
effective in improving the overall retrieval performance.


For a given query, the basic procedures of our proposed method
include:

• Concept Match. We select the topic-related concepts with the help of
the Freebase API. Taking the query “Mila Kunis in Oz Movie” (MB141) as
an example, we match two concepts, “Mila Kunis” and “The Wizard of
Oz”, in Freebase.

• Term Selection. Freebase describes the human knowledge of a given
concept using types and properties. For some important meta types such
as alias, name, notable_for and notable_types in the common domain, we
directly add terms from the corresponding properties to the knowledge
query. For other types (i.e. description and domain specific types),
we adopt an association based term selection method to extract the top
K topic-related terms.

Taking the concept “Mila Kunis” as an example, we apply the term
selection method to obtain the top knowledge words “oz, great, power”
from the description property, and directly add the knowledge terms
“celebrity actor milena markovna kunis” from the meta properties.

Then, we treat the selected knowledge terms from all related concepts
equally to form a new knowledge query $Q_{fb}$. After that, the
knowledge query model $\theta_{Q_{fb}}$ is interpolated with the
original query model $\theta_Q$:

$$P(w|\theta_{Q_1}) = (1 - \alpha) \cdot P(w|\theta_Q) + \alpha \cdot P(w|\theta_{Q_{fb}}) \tag{2}$$

where $\alpha \in [0,1]$ is the weighting parameter that controls the
influence of the knowledge query. Both $\theta_Q$ and
$\theta_{Q_{fb}}$ are estimated with the maximum likelihood estimator.
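
The interpolation in Eq. 2 can be illustrated with a short Python sketch. It reuses the “Mila Kunis in Oz Movie” example, assuming the stopword “in” has already been removed; the function names are hypothetical and the token lists are only for illustration.

```python
from collections import Counter


def mle_model(tokens):
    """Maximum likelihood estimate of a unigram model from a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def interpolate(original_model, knowledge_model, alpha=0.5):
    """Eq. 2: mix the original query model with the knowledge query model."""
    vocab = set(original_model) | set(knowledge_model)
    return {
        w: (1 - alpha) * original_model.get(w, 0.0)
        + alpha * knowledge_model.get(w, 0.0)
        for w in vocab
    }


# Original query terms and knowledge terms taken from the running example.
original_query = ["mila", "kunis", "oz", "movie"]
knowledge_query = ["celebrity", "actor", "milena", "markovna", "kunis",
                   "oz", "great", "power"]
expanded_model = interpolate(mle_model(original_query),
                             mle_model(knowledge_query), alpha=0.5)
```

With alpha = 0.5, a term such as “kunis” that occurs in both queries receives 0.5 · 0.25 + 0.5 · 0.125 = 0.1875, while a term that occurs only in the original query, such as “mila”, keeps half of its original mass.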


Concept Match We now describe our concept match algorithm in detail,
which can be summarized in two steps:


1. Noun Phrase Detection. For a given query Q, we first split Q on
whitespace and obtain a sequence of words $q_1, q_2, \ldots, q_n$.
Part-of-speech tagging (Roth and Zelenko 1998) is then performed on
each word, and all the noun phrases are extracted from the original
query with a rule-based method (Bird, Klein, and Loper 2009).


2. Maximum Match. We treat each noun phrase as a new query and get the
related concepts as described in Algorithm 1. The FreebaseSearch
function searches Freebase for the given query and returns the top
ranked concept if one is found. The match process ends when a related
concept is found or none of the individual words finds a match. For
the sake of efficiency, we can use a hash map that records the
searched substrings to avoid duplicate calls to FreebaseSearch.


Generation of Knowledge Query 

We generate the knowledge query based on types, aiming at extracting
terms from different properties for a given query.

Term Selection For each concept returned by the Freebase API, the
search result provides different types and corresponding properties,
which reflect different aspects of the concept.















Table 2: Top retrieved tweets for TREC topics.

| Topic No. | Topic | Relevant Tweet Example |
|---|---|---|
| MB071 | Australian Open Djokovic vs. Murray | Tomorrow is the Australian open tennis final for men, Andy Murray vs. Navok Djokovic Whos gonna win?? Im a Murray fan so I say GO MURRAY!! |
| MB115 | memories of Mr. Rogers | “@MellowAnniston: Happy late Birthday to Mr. Rogers! ” Omg, Mr. Rogers and I have the same bday? Lol |
| MB141 | Mila Kunis in Oz movie | Aw new Oz movie why you go make Mila Kunis ugly, why sir WHY?! |
| MB150 | UK wine industry | Wine, grape industry accounts for $6.8bn in Canadian economy: Report: Wine and grape industry in Canada accoun ... |


Algorithm 1 GetConcept(NQ)

Input: Noun Phrase Query NQ = q_1 q_2 ... q_n.
Output: Candidate Concept Set CSet.

CSet ← FreebaseSearch(NQ)
if CSet is empty then
    if n == 1 then
        return ∅
    end if
    NQ_1 ← q_1 q_2 ... q_{n-1}
    NQ_2 ← q_2 q_3 ... q_n
    CSet_1 ← GetConcept(NQ_1)
    CSet_2 ← GetConcept(NQ_2)
    return CSet_1 ∪ CSet_2
else
    return CSet
end if
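
Algorithm 1, together with the hash map suggested in the Maximum Match step, can be sketched in Python as follows. The search backend is passed in as a callable; `toy_search` and `TOY_INDEX` are hypothetical stand-ins for the Freebase API, whose real request and response format is not described in this paper.

```python
def make_get_concept(freebase_search):
    """Build a memoized GetConcept around a search callable.

    `freebase_search(text)` is a hypothetical stand-in that returns an
    iterable of concept names (empty when nothing matches).
    """
    cache = {}  # searched word sequences -> concept sets

    def get_concept(words):
        key = tuple(words)
        if key in cache:  # reuse earlier lookups instead of searching again
            return cache[key]
        concepts = set(freebase_search(" ".join(key)))
        if not concepts and len(key) > 1:
            # Back off to the two sub-phrases obtained by dropping the
            # last word and the first word, then merge their results.
            concepts = get_concept(key[:-1]) | get_concept(key[1:])
        cache[key] = concepts
        return concepts

    return get_concept


# Toy index standing in for Freebase; every issued search is recorded.
TOY_INDEX = {"mila kunis": {"Mila Kunis"}, "oz": {"The Wizard of Oz"}}
search_log = []


def toy_search(text):
    search_log.append(text)
    return TOY_INDEX.get(text, set())


get_concept = make_get_concept(toy_search)
matched = get_concept(["mila", "kunis", "oz"])
```

The cache matters for longer phrases: a four-word phrase with no match has ten distinct sub-phrases, but the plain recursion would issue fifteen searches because the shared middle sub-phrases are reached along two paths.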


Among the types provided by the search result, some (i.e. meta types)
are very general and precise, such as alias, name, notable_for and
notable_types in the common domain; for these types, we directly add
the property terms to the knowledge query. For other types with long
texts, such as description and domain specific ones, an association
based term selection method is used to extract the topic-related
knowledge terms. Effective term selection is an important issue for an
automatic query expansion technique. In microblog retrieval, a good
expansion term should satisfy the following criteria:

1. The term should be semantically associated with the 
concept from the original query; 

2. The term extracted from Freebase should also be 
widely adopted in the Twitter corpus while talking 
about the concept; 


We score the candidate terms with an association based method on the
basis of the top ranked N pseudo-relevance documents (PRD):

$$Score(w) = \sum_{D \in PRD} P(D) \cdot P(w|D) \cdot \prod_{i=1}^{n} P(q_i|D) \tag{3}$$


where $P(D)$ is the document prior, which is usually assumed to be
uniform, and $\prod_{i=1}^{n} P(q_i|D)$ is the query likelihood given
the document model, which is traditionally computed using Dirichlet
smoothing. To meet the third criterion, we follow the work of (Li and
Croft 2003) and incorporate the temporal evidence into the document
prior in Eq. 3 by using an exponential distribution:

$$P(D|T_D) = r \cdot e^{-r(T_Q - T_D)} \tag{4}$$

where $r$ is the exponential parameter that controls the temporal
influence, $T_Q$ is the query issue time and $T_D$ is the tweet post
time. Both $T_Q$ and $T_D$ are measured in fractions of days. Note
that $T_D$ is always less than $T_Q$, as we cannot use future
evidence.
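
The term scoring of Eq. 3 with the temporal prior of Eq. 4 can be sketched as follows. The paper does not state how P(w|D) is estimated, so this sketch assumes an unsmoothed maximum likelihood estimate, which gives a zero score to candidate terms that never occur in the pseudo-relevance documents. The function names and the toy documents are likewise assumptions.

```python
import math
from collections import Counter


def temporal_prior(query_time, doc_time, r=0.1):
    """Eq. 4: exponential prior over the document age, in days."""
    return r * math.exp(-r * (query_time - doc_time))


def dirichlet_prob(word, counts, length, collection_model, mu=100.0):
    """Dirichlet-smoothed P(q_i | D) used for the query likelihood."""
    return (counts.get(word, 0) + mu * collection_model.get(word, 1e-9)) / (length + mu)


def score_terms(candidates, query_terms, prd, query_time, collection_model,
                r=0.1, mu=100.0):
    """Eq. 3 with P(D) replaced by the temporal prior.

    `prd` is a list of (tokens, post_time_in_days) pairs.
    """
    scores = {w: 0.0 for w in candidates}
    for tokens, doc_time in prd:
        counts, length = Counter(tokens), len(tokens)
        prior = temporal_prior(query_time, doc_time, r)
        query_likelihood = math.prod(
            dirichlet_prob(q, counts, length, collection_model, mu)
            for q in query_terms
        )
        for w in candidates:
            p_w = counts.get(w, 0) / length if length else 0.0  # assumed MLE
            scores[w] += prior * p_w * query_likelihood
    return scores


# Two toy pseudo-relevance tweets that differ only in one word and in age.
prd = [
    (["mila", "kunis", "oz", "great"], 1.0),   # posted on day 1
    (["mila", "kunis", "oz", "power"], 9.0),   # posted on day 9
]
all_tokens = [t for tokens, _ in prd for t in tokens]
collection_model = {w: c / len(all_tokens) for w, c in Counter(all_tokens).items()}
term_scores = score_terms(["great", "power", "celebrity"],
                          ["mila", "kunis", "oz"], prd,
                          query_time=10.0, collection_model=collection_model)
```

Because the two toy tweets match the query equally well, the only difference between “great” and “power” is the prior, so the term from the more recent tweet scores higher.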

Finally, we select the top scored K words from the common description
and domain specific properties. These K words and the terms extracted
from the meta properties are treated equally and combined to form the
knowledge query $Q_{fb}$.


Mixture Feedback Model 

With the knowledge query environment, we believe the information need
is easier to understand, which could lead to high precision in the top
retrieved tweets. Based on this hypothesis, we further utilize
model-based feedback to update the query representation. More
specifically, we update $\theta_{Q_1}$ with the simple mixture model
$\theta_F$, which is widely used in microblog retrieval (Zhai and
Lafferty 2001a; Liang, Qiang, and Yang 2012).


3. As the user’s intent may change and events related to the given
topic develop over time, the ranking function should favor the
short-term words that are mostly used in recent tweets.

The candidate terms extracted from Freebase meet the first criterion
to some extent. In order to satisfy the second criterion, we score the
candidate terms with an association based method (Eq. 3).


$$P(w|\theta_{Q_2}) = (1 - \beta) \cdot P(w|\theta_{Q_1}) + \beta \cdot P(w|\theta_F) \tag{5}$$

where $\beta \in [0,1]$ is a weighting parameter that controls the
amount of model-based feedback.

The model-based feedback model generates a feedback document by mixing
the query topic model $\theta_F$ with the collection language model
$\theta_C$. Under this simple mixture model, the log-likelihood of the
feedback documents F is:

$$\log P(F|\theta_F) = \sum_{w} c(w, F) \cdot \log\big((1 - \lambda) \cdot P(w|\theta_F) + \lambda \cdot P(w|\theta_C)\big) \tag{6}$$

where $c(w, F)$ is the count of word w in the set of feedback
documents F. We then follow the work of (Zhai and Lafferty 2001a) and
implement the EM algorithm with the fixed smoothing parameter
$\lambda = 0.5$. Regardless of whether the query finds knowledge terms
in Freebase, the query environment is updated by the model-based
feedback.
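
A minimal sketch of the EM estimation for the simple mixture model of Eq. 6 is given below, using the standard E-step and M-step for a two-component mixture with a fixed background weight. The number of iterations, the function name, and the toy collection probabilities are assumptions made for illustration.

```python
from collections import Counter


def estimate_feedback_model(feedback_docs, collection_model, lam=0.5,
                            iterations=50):
    """EM for the topic model theta_F that maximizes Eq. 6.

    `feedback_docs` is a list of token lists; `collection_model` maps a
    word to P(w | theta_C). Words missing from the collection model get
    a small floor probability (an assumption of this sketch).
    """
    counts = Counter(w for doc in feedback_docs for w in doc)
    total = sum(counts.values())
    theta_f = {w: c / total for w, c in counts.items()}  # start from the MLE
    for _ in range(iterations):
        expected = {}
        for w, c in counts.items():
            topic = (1 - lam) * theta_f[w]
            background = lam * collection_model.get(w, 1e-9)
            # E-step: expected count of w generated by the topic model.
            expected[w] = c * topic / (topic + background)
        norm = sum(expected.values())
        # M-step: renormalize the expected counts.
        theta_f = {w: e / norm for w, e in expected.items()}
    return theta_f


# Toy feedback set in which "the" is frequent but mostly background.
feedback_docs = [["oz", "the", "movie", "the"], ["oz", "the", "kunis"]]
collection_model = {"the": 0.5, "oz": 0.01, "movie": 0.05, "kunis": 0.01}
theta_f = estimate_feedback_model(feedback_docs, collection_model)
```

In the toy example, “the” is the most frequent word in the feedback tweets, yet most of its occurrences are explained by the collection model, so the estimated topic model gives more weight to “oz”.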


Evaluation 
Experimental Setup 

In this section, we describe the experimental dataset and the
evaluation methods adopted in the TREC Microblog Track (Ounis et al.
2012; Soboroff, Ounis, and Lin 2013; Lin and Efron 2014). In addition,
baselines are set up to estimate the effect of the proposed methods.
Notations and abbreviations that appear in our experiments are given
in Table 3.


Table 3: Abbreviations of Experimental Systems.

| Abbreviation | Description |
|---|---|
| SimpleKL | Simple KL-divergence retrieval model without query expansion and document expansion. |
| QESMM | KL-divergence retrieval model with model-based feedback (Zhai and Lafferty 2001a). |
| QEWiki | KL-divergence retrieval model with query model $\theta_{Q_1}$, where expansion terms are derived from top retrieved Wikipedia articles. |
| RTRM | Real-time ranking model proposed in (Liang, Qiang, and Yang 2012), using a two-stage query expansion method and Gaussian function based temporal re-ranking with ranking position profile. |
| QEFB | KL-divergence retrieval model with query model $\theta_{Q_1}$, where terms are derived from both the description property and meta properties in Freebase. |
| QEFBNT | KL-divergence retrieval model with query model $\theta_{Q_1}$, without temporal prior while selecting knowledge terms. |
| QEManualFB | The same as QEFB except that the Freebase concepts are manually selected. |
| QEFB+SMM | KL-divergence retrieval model with query model $\theta_{Q_2}$. |


Data Set Two corpora (i.e. the Tweets11 and Tweets13 collections) are
used in our experiments. Instead of distributing the microblog corpus
via physical media or direct download, the TREC organizers release a
streaming API³ to participants (Lin and Efron 2014). Using the
official API, we crawled a set of local copies of the canonical
corpora. The Tweets11 collection has a sample of about 16 million
tweets, ranging from January 24, 2011 to February 8, 2011, while the
Tweets13 collection contains about 259 million tweets, ranging from
February 1, 2013 to March 31, 2013. In addition, we crawled all the
shortened URLs contained in the Tweets11 and Tweets13 corpora, and
inferred their topic information (i.e. the title of the crawled
webpage) to enrich the original tweets. In particular, we consider the
title information of the embedded URLs as the local context of the
original tweets and combine it with the original tweets to form the
tweet language model (Liang, Qiang, and Yang 2012). Tweets11 is used
to evaluate the effectiveness of the proposed real-time Twitter search
systems over 50 official topics (MB001-MB050) in the TREC’11 Microblog
track as well as 60 official topics (MB051-MB110) in the TREC’12
Microblog track.⁴ Tweets13 is used to evaluate the proposed real-time
Twitter search systems over 60 official topics (MB111-MB170) in the
TREC’13 Microblog track. In our experiments, TREC’11 topics are used
for tuning the parameters, and we then use the best parameter settings
to evaluate our methods with TREC’12 and TREC’13 topics.


The tweets and their corresponding topic information were preprocessed
in several ways. First, we discarded the non-English tweets using a
language detector with infinity-gram, named ldig.⁵ Second, in
conformance with the track’s guidelines, all simple retweets were
removed by deleting documents beginning with the string ‘RT’.
Moreover, each tweet was stemmed using the Porter algorithm and
stopwords were removed using the InQuery stopwords list.
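
The retweet filtering and stopword removal steps can be sketched as follows. The language detection step is omitted, the stemmer is a pluggable function that defaults to the identity (a Porter stemmer would be passed in here), and the tiny stopword set and the tokenization pattern are only stand-ins for the InQuery list and the actual tokenizer.

```python
import re

# Stand-in stopword set; the experiments use the InQuery stopwords list.
STOPWORDS = frozenset({"a", "an", "the", "in", "of", "to", "is", "and",
                       "for", "why", "you"})


def is_simple_retweet(text):
    """A simple retweet is a document beginning with the string 'RT'."""
    return text.startswith("RT")


def preprocess(tweets, stem=lambda token: token, stopwords=STOPWORDS):
    """Drop simple retweets, tokenize, remove stopwords, and stem."""
    docs = []
    for text in tweets:
        if is_simple_retweet(text):
            continue
        tokens = re.findall(r"[a-z0-9]+", text.lower())  # assumed tokenizer
        docs.append([stem(t) for t in tokens if t not in stopwords])
    return docs


processed = preprocess([
    "RT @someone: new Oz movie",
    "Aw new Oz movie why you go make Mila Kunis ugly",
])
```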


Evaluation Metric In the TREC Microblog Track, tweets were judged on
the basis of the defined information need using a three-point scale
(Ounis et al. 2012): irrelevant (labeled as 0), minimally relevant
(labeled as 1), and highly relevant (labeled as 2). The main
evaluation metrics are Mean Average Precision (MAP) over the top 1000
documents and Precision at N (P@N), which are widely used in IR. MAP
and P@30 with respect to allrel (i.e. the tweet set judged as highly
or minimally relevant) are used in this paper. We also perform a
query-by-query analysis and conduct a t-test to determine whether the
improvements in MAP and P@30 are statistically significant.
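
The two metrics can be computed as in the sketch below, which uses their standard definitions. The helper that builds the allrel set from graded judgments and all function names are illustrative assumptions; official scores are normally produced with the track’s evaluation tools.

```python
def allrel(judgments):
    """Tweets judged minimally (1) or highly (2) relevant."""
    return {doc_id for doc_id, label in judgments.items() if label >= 1}


def precision_at_n(ranked_ids, relevant_ids, n=30):
    """Fraction of the top n results that are relevant."""
    return sum(1 for d in ranked_ids[:n] if d in relevant_ids) / n


def average_precision(ranked_ids, relevant_ids, cutoff=1000):
    """Average of the precision values at the ranks of relevant results."""
    if not relevant_ids:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids[:cutoff], start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids)


def mean_average_precision(runs, qrels, cutoff=1000):
    """Mean of the per-topic average precision over all judged topics."""
    return sum(average_precision(runs[t], qrels[t], cutoff) for t in qrels) / len(qrels)


# Toy topic: d1 is highly relevant, d3 minimally relevant, d2 irrelevant.
relevant = allrel({"d1": 2, "d2": 0, "d3": 1})
ranking = ["d1", "d2", "d3", "d4"]
ap = average_precision(ranking, relevant)
p_at_2 = precision_at_n(ranking, relevant, n=2)
```

For the toy ranking, the relevant tweets appear at ranks 1 and 3, so the average precision is (1/1 + 2/3) / 2.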


3 https://github.com/lintool/twitter-tools
4 Topics MB050 and MB076 have no relevant tweets. Therefore, we did
not use them in our experiments.
5 http://github.com/shuyo/ldig










































Baselines To demonstrate the performance of our proposed method, we
compare our knowledge-based query expansion methods with several
baseline methods.

(1) The simple KL-divergence retrieval model (denoted as SimpleKL)
(Zhai and Lafferty 2001b) is used as our first baseline. That is, we
estimate $\theta_Q$ and $\theta_D$ with the empirical word
distribution, and we choose the Dirichlet smoothing method for
document model estimation. Throughout this paper, we set the Dirichlet
smoothing parameter $\mu = 100$, which has been reported to give good
retrieval performance in microblog retrieval (Liang, Qiang, and Yang
2012).

(2) We use the Simple Mixture Model (Zhai and Lafferty 2001a) (denoted
as QESMM) as our second baseline, with the number of feedback
documents optimized to 7 and the number of terms in the feedback model
to 5. The smoothing parameter $\beta$ is set to 0.9.

(3) QEWiki is a Wikipedia-based query expansion method, which is
similar to the work of (Li et al. 2007). We downloaded a local copy of
the Wikipedia data for faster access and indexed the articles using
the Lemur toolkit⁶ (version 4.12). The expansion terms are derived
from the top ranked Wikipedia articles. In our experiments, we rank
Wikipedia articles using a language model (i.e. SimpleKL), and a total
of 10 terms are picked from the top 5 documents. We then treat the
terms as a new query and interpolate it with the original query. The
interpolation parameter $\alpha$ in Eq. 2 for QEWiki is set to 0.4.

(4) We also compare our method with the state-of-the-art real-time
ranking model (denoted as RTRM) under the language modeling framework,
proposed by (Liang, Qiang, and Yang 2012). The RTRM approach also uses
a two-stage pseudo-relevance feedback query expansion to estimate the
query language model. In addition, RTRM adopts a temporal re-ranking
component to evaluate the temporal aspects of tweets.

We tune all the parameters of these models with TREC’11 topics on the
Tweets11 corpus.


Experimental Results 

We conduct several experiments to measure the effects of our query
expansion methods. For our knowledge-based query expansion method, we
label the method with query model $\theta_{Q_1}$ as QEFB, and the one
with $\theta_{Q_2}$ as QEFB+SMM. When selecting knowledge terms from
the Freebase description and domain specific (e.g. Business domain)
properties, we set the number of top ranked PRD N to 100 and the
number of expansion terms K to 5. The exponential parameter r for the
temporal prior is set to 0.1. $\alpha$ in Eq. 2 is set to 0.5, which
means we regard the original query and the knowledge query as equally
important. The query expansion parameters in the mixture feedback
model are set as in QESMM except that the interpolation parameter
$\beta$ is set to 0.6. All the parameters are tuned with TREC’11
topics. We then test the optimized models with TREC’12 and TREC’13
topics.

6 http://www.lemurproject.org/lemur.php

Table 4 shows the performance comparison of different query expansion
methods. For statistical significance, we used a paired t-test. f, |,
and § indicate that the corresponding improvements over SimpleKL,
QESMM, QEWiki and RTRM are statistically significant (p < 0.05),
respectively. Note that all the methods listed in the table estimate
the document model in the same way as SimpleKL. As we can see, all of
the query expansion methods achieve significant MAP and P@30
improvements over the SimpleKL method, which indicates the
effectiveness of query expansion in microblog retrieval. In addition,
QEFB performs better than the Wikipedia-based query expansion method
QEWiki. This shows the superiority of our Freebase-based query
expansion method and demonstrates the effectiveness of the structured
data.

When the query is expanded with the Freebase knowledge query, our
approach can retrieve more relevant documents in the top results.
Thus, we can further improve the retrieval performance by combining
the knowledge-based expansion method with the mixture feedback model.
Our knowledge-based query expansion method QEFB+SMM achieves the best
retrieval performance on the three topic sets with respect to both the
MAP and P@30 metrics. More specifically, for TREC’12 topics, QEFB+SMM
improves the MAP over SimpleKL and QESMM by 23.80% and 12.35%,
respectively, while the corresponding improvements in terms of P@30
are 14.91% and 11.56%, respectively. For TREC’13 topics, QEFB+SMM
raises the MAP over SimpleKL and QESMM by 24.81% and 12.02%,
respectively, while the corresponding P@30 improvements are 16.42% and
12.37%, respectively. Moreover, our method also beats the
state-of-the-art baseline RTRM, which uses a two-stage query expansion
method.
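
The relative improvements quoted above follow directly from the MAP and P@30 values in Table 4, as the short check below shows. The dictionary layout and function names are only for illustration.

```python
def relative_improvement(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# (MAP, P@30) values copied from Table 4.
TABLE4 = {
    "TREC'12": {"SimpleKL": (0.2727, 0.3938),
                "QESMM": (0.3005, 0.4056),
                "QEFB+SMM": (0.3376, 0.4525)},
    "TREC'13": {"SimpleKL": (0.2926, 0.4939),
                "QESMM": (0.3260, 0.5117),
                "QEFB+SMM": (0.3652, 0.5750)},
}


def improvements(topic_set, baseline):
    """(MAP, P@30) improvements of QEFB+SMM over a baseline, in percent."""
    ours = TABLE4[topic_set]["QEFB+SMM"]
    base = TABLE4[topic_set][baseline]
    return tuple(round(relative_improvement(o, b), 2) for o, b in zip(ours, base))
```

For example, the MAP improvement over SimpleKL on TREC’12 is (0.3376 - 0.2727) / 0.2727, which rounds to 23.80%.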

To further demonstrate the effectiveness of our proposed method, we
also compare our QEFB+SMM with the top three automatic runs in the
TREC 2012 and 2013 Microblog tracks. Table 5 shows the MAP and P@30
performance of all these runs. Note that for TREC’12, the ranking
scores are computed with respect to the highrel set (Soboroff, Ounis,
and Lin 2013), while for TREC’13, the scores are computed with respect
to the allrel set (Lin and Efron 2014). From the table, we can observe
that our system is comparable with the top three runs in the TREC
Microblog track. Moreover, QEFB+SMM even beats the best automatic run
in TREC’13 with respect to both evaluation metrics.


Discussion 

Many parameters in our proposed approach can affect the system
performance. In this section, we analyze the robustness of the
parameter settings in the knowledge-based query expansion method. All
the experiments in this section are run on TREC’11 topics, which are
used for parameter selection.
















Table 4: The performance comparison of different query expansion methods. The best performances are marked in bold.

| Method | TREC’11 MAP | TREC’11 P@30 | TREC’12 MAP | TREC’12 P@30 | TREC’13 MAP | TREC’13 P@30 |
|---|---|---|---|---|---|---|
| SimpleKL | 0.3645 | 0.3850 | 0.2727 | 0.3938 | 0.2926 | 0.4939 |
| QESMM | 0.3957 | 0.4218 | 0.3005 | 0.4056 | 0.3260 | 0.5117 |
| QEWiki | 0.4041 | 0.4177 | 0.3175 | 0.4203 | 0.3099 | 0.5111 |
| RTRM | 0.4226 | 0.4463 | 0.3250 | 0.4458 | 0.3507 | 0.5406 |
| QEFB | 0.4289f | 0.4252 | 0.3198f | 0.4333 | 0.3149f | 0.5117 |
| QEFB+SMM | **0.4369**fft | **0.4497**ft | **0.3376**ft | **0.4525**fft | **0.3652**fft | **0.5750**ftt§ |


Table 5: The performance comparison of our QEFB+SMM with TREC best runs. The best performances are marked in bold.

| Method | TREC’12 MAP | TREC’12 P@30 | TREC’13 MAP | TREC’13 P@30 |
|---|---|---|---|---|
| 1st run | **0.2642** | **0.2701** | 0.3524 | 0.5528 |
| 2nd run | 0.2411 | 0.2446 | 0.3506 | 0.5544 |
| 3rd run | 0.2093 | 0.2384 | 0.3494 | 0.5372 |
| QEFB+SMM | 0.2415 | 0.2429 | **0.3652** | **0.5750** |




Figure 1: Sensitivity to the selected knowledge term 
number K. 


Effects of Knowledge Query For query modeling, we propose using the
knowledge query to make the information need more comprehensible.
Several factors affect the quality of the knowledge terms: (1) whether
the maximum match algorithm can get topic-related concepts from
Freebase; (2) the number of knowledge terms K; and (3) the number of
pseudo-relevance documents N used for term selection.

To answer the first question, we create the run
QEManualFB, in which we manually select the concept
from Freebase for each query. The interpolation
parameter α for all these models is set to 0.5. Figure 1
shows the MAP and P@30 scores of all the models
for different K and fixed N = 100. In particular, MAX
means that all the candidate terms that satisfy
Score(w) > 0 in Eq. 3 are selected. We can see that
although QEFB is not better than QEManualFB, the
performance gap between them is small, which verifies
the effectiveness of our concept match algorithm.
Moreover, when K is set around 5, QEFB achieves its
optimal retrieval performance and is significantly
better than SimpleKL, which indicates the effectiveness
of the association-based term selection method.
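
The selection rule described above (keep the K best candidate terms, with MAX meaning every term whose score is positive) can be sketched as follows. This is only an illustration: the function name, the toy scores, and the alphabetical tie-breaking are assumptions, and the scores themselves would come from the scoring equation of the paper, which is not reproduced here.

```python
from typing import Dict, List, Optional


def select_knowledge_terms(scores: Dict[str, float],
                           k: Optional[int] = 5) -> List[str]:
    """Return the k highest-scoring candidate terms with a positive score.

    `scores` maps each candidate term to an association score (toy values
    in this sketch). Passing k=None mimics the MAX setting: every term
    with Score(w) > 0 is kept.
    """
    positive = [(term, s) for term, s in scores.items() if s > 0]
    # Sort by descending score; ties are broken alphabetically (an assumption).
    positive.sort(key=lambda item: (-item[1], item[0]))
    if k is not None:
        positive = positive[:k]
    return [term for term, _ in positive]


# Hypothetical scores for illustration only.
candidates = {"drought": 0.42, "africa": 0.37, "area": 0.10,
              "play": 0.0, "global": 0.21}
top_two = select_knowledge_terms(candidates, k=2)
all_positive = select_knowledge_terms(candidates, k=None)
print(top_two)
print(all_positive)
```

A small K keeps only the terms most strongly associated with the query, while MAX admits every positively scored term and therefore also the weaker ones.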

To further show the sensitivity to the number of PRD
used for term selection, we fix the term number K
at 5 and vary the PRD number N. Figure 2 shows
the MAP and P@30 scores of our QEFB model against
different values of N. QEFB achieves its optimal
performance when N is set to 100. That is, the top 100
pseudo-relevance documents provide adequate
information for selecting good knowledge terms from
Freebase descriptions and domain-specific properties.



Figure 2: Sensitivity to the PRD number N for
knowledge term selection.


Effects of Temporal Evidence Previous work
(Efron and Golovchinsky 2011b) showed that the choice
of the rate parameter r for the exponential
distribution used as a temporal prior has a strong
effect on retrieval. In the previous sections, we
set r in the temporal prior equation to 0.1. Now, we
verify the effect of the temporal evidence in our
expansion method.

In our method, the temporal prior affects the
knowledge terms selected from the Freebase properties. A
large r favors the terms that are used recently in the
pseudo-relevance documents. For comparison, we
create a run named QEFBNT that ignores the temporal
evidence. Figure 3 shows the P@N scores of QEFB
with different values of r. Only four values of r are
shown here, although more were tried.
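
To illustrate why a larger r favors recently used terms, the sketch below weights each pseudo-relevance document by an exponential prior r · exp(-r · age). The functional form is the standard exponential density; measuring age in days, the toy documents, and the way the prior is folded into term weights are assumptions made for this example and are not taken from the paper's equations.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def temporal_prior(age: float, r: float) -> float:
    """Exponential prior r * exp(-r * age); age is time elapsed before the query."""
    if r <= 0:
        raise ValueError("the rate parameter r must be positive")
    return r * math.exp(-r * age)


def recency_weighted_counts(docs: List[Tuple[float, List[str]]],
                            r: float) -> Dict[str, float]:
    """Sum term occurrences, weighting each document by its normalized prior."""
    priors = [temporal_prior(age, r) for age, _ in docs]
    total = sum(priors)
    weights: Dict[str, float] = defaultdict(float)
    for prior, (_, terms) in zip(priors, docs):
        for term in terms:
            weights[term] += prior / total
    return dict(weights)


# Toy pseudo-relevance documents: (age in days, terms). Values are made up.
prd = [(1.0, ["drought", "africa"]), (20.0, ["global", "area"])]
for r in (0.01, 0.1, 0.5):
    w = recency_weighted_counts(prd, r)
    print(r, round(w["drought"], 3), round(w["global"], 3))
```

As r grows, almost all of the weight moves to the terms of the one-day-old document, which mirrors the short-term bias discussed in this section.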

We can observe from the figure that an appropriate
r improves the retrieval performance over QEFBNT
in terms of P@N. Besides, a large r can greatly
improve the precision of the top retrieved tweets.
Note that QEFB (r = 0.5) has the maximum P@1 and
P@5 scores compared with the other settings. However, it
does not show any superiority over the other models with
respect to the P@N (N > 10) scores. In fact, the MAP
score of QEFB (r = 0.5) is also lower than that of QEFB
with a small r. A plausible explanation for this
phenomenon may be that, with more short-term
words, more tweets with higher relevance can be
retrieved easily and thus the precision of the top 5
ranked tweets is boosted. But at the same time, more
irrelevant tweets could be retrieved among the top 30
documents, as these terms overemphasize recency.

Taking the query “water shortage” (MB111) as an
example, the top knowledge words of QEFBNT are
“affect, global, area”. For QEFB with r = 0.5, the
top words are “africa, drought, play”. This indicates
that people have recently focused mainly on the drought
in Africa when talking about water shortage. In our
system, we finally choose QEFB (r = 0.1), which has
both high and stable P@N (1 < N < 30) and MAP
scores.

Figure 3: Sensitivity of the QEFB model to the
exponential rate parameter r.

Effects of the Interpolation Coefficients Recall
that we first expand the query with the knowledge query,
and further expand the updated query θ_Q′ with
model-based feedback. The first-stage query expansion is
controlled by a coefficient α, while the second-stage
expansion is controlled by β. Figure 4 shows the
performance variance of QEFB (N = 100, K = 5) against
different values of α. When α = 0, QEFB degenerates
into the baseline method SimpleKL. When α = 1,
we completely ignore the original query and only use
the knowledge query. We can observe that the
performance of QEFB is better than SimpleKL when α is
no greater than 0.7. The optimal performance is
obtained when α is set around 0.5.

Figure 4: Sensitivity to the first-stage knowledge query
expansion coefficient α.

Figure 5 shows the performance variance of
QEFB+SMM against different values of β. When β = 0,
QEFB+SMM degenerates into QEFB. The
second-stage expansion seems to be more robust and is
consistently better than QEFB with respect to P@30.
After knowledge-based query expansion, the query is
more comprehensible and retrieves more related tweets
at the top, which leads to further improvement with
traditional model-based feedback. However, when it comes
to the MAP metric, the performance of QEFB+SMM
drops when β is larger than 0.3. Finally, we choose
β = 0.6, which is a tradeoff between MAP and P@30.

Figure 5: Sensitivity to the second-stage mixture
feedback interpolation coefficient β.
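
The limiting cases discussed in this section (a first-stage coefficient of 0 recovers SimpleKL, a value of 1 uses only the knowledge query, and a second-stage coefficient of 0 recovers QEFB) are consistent with a two-stage linear interpolation of query language models. The sketch below assumes that form; the function name, the toy unigram models, and the exact mixing formula are illustrative assumptions rather than the paper's implementation.

```python
from typing import Dict

Model = Dict[str, float]


def interpolate(base: Model, expansion: Model, coeff: float) -> Model:
    """Return (1 - coeff) * base + coeff * expansion over the union vocabulary."""
    if not 0.0 <= coeff <= 1.0:
        raise ValueError("coeff must lie in [0, 1]")
    vocab = set(base) | set(expansion)
    return {w: (1.0 - coeff) * base.get(w, 0.0) + coeff * expansion.get(w, 0.0)
            for w in vocab}


# Toy unigram models (made-up probabilities, each summing to 1).
original = {"water": 0.5, "shortage": 0.5}
knowledge = {"drought": 0.6, "africa": 0.4}
feedback = {"water": 0.4, "crisis": 0.6}

# First stage: mix the original query with the knowledge query (alpha = 0.5).
stage1 = interpolate(original, knowledge, coeff=0.5)
# Second stage: mix the updated query with the feedback model (beta = 0.6).
stage2 = interpolate(stage1, feedback, coeff=0.6)
print(sorted(stage2.items()))
```

Because each stage is a convex combination of two distributions, the expanded query remains a proper probability distribution for any coefficient in [0, 1].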

Conclusion and Future Work 

In this study, we proposed using knowledge-based query
expansion to address the problems in microblog search.
With the knowledge terms derived from Freebase,
the queries in the microblogosphere become more
comprehensible and thus more relevant documents can be
retrieved. The knowledge terms from Freebase are required
to co-occur with query terms in the PRD, which has the
potential to alleviate the topic drift induced by
knowledge-based QE. Freebase’s structured information is
well utilized in the knowledge query generation procedure.
Moreover, we incorporated temporal evidence into the
query representation. Thus the proposed method favors
recent tweets, which satisfies the real-time information
need in microblog retrieval. Our thorough evaluation,
using two standard TREC collections, demonstrates the
effectiveness of the proposed method.

Several directions remain for future work. One of the
most interesting is to investigate more sophisticated
algorithms for exploiting the domain information of
Freebase. By further analyzing the domain information
of the concepts for a given query, we can also assign
the retrieved tweets to different domains, which can
be used to generate a structured result representation.
Moreover, we can classify the queries into two categories,
temporal-dependent and temporal-independent ones,
and use different strategies to estimate temporal
evidence for each category.